{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b3108e3d-dc5b-4f4f-8bd5-88c71e422b38",
   "metadata": {},
   "source": [
    "# Andrews Curves Plot\n",
    "\n",
    "An Andrews curves plot of four standardized features from the penguins dataset, colored by species. Each penguin is drawn as one curve, so penguins with similar measurements produce similar curves. Gentoo penguins form a clearly separate band, while the Adelie and Chinstrap curves overlap moderately. This suggests that a classification model (e.g. logistic regression or a decision tree) should separate Gentoo from the other two species easily, with most of the remaining errors likely to come from confusing Adelie and Chinstrap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb7fa534-d6ef-4465-a215-4ea13b5b20f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import pandas as pd\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "cols = [\"bill_length_mm\", \"bill_depth_mm\", \"flipper_length_mm\", \"body_mass_g\"]\n",
    "\n",
    "# Standardize the features so that no single one dominates the curves\n",
    "scaled_features = StandardScaler().fit_transform(df[cols])\n",
    "# Reuse the original index so the species labels align row by row\n",
    "df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)\n",
    "df_scaled[\"species\"] = df[\"species\"]\n",
    "\n",
    "hvplot.plotting.andrews_curves(\n",
    "    df_scaled,\n",
    "    class_column=\"species\",\n",
    "    samples=30,\n",
    "    title=\"Andrews Curves Plot (Bokeh)\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f09e403-876f-46a2-9cff-5fca4108342a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import pandas as pd\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "hvplot.extension(\"matplotlib\")\n",
    "\n",
    "df = hvplot.sampledata.penguins(\"pandas\")\n",
    "cols = [\"bill_length_mm\", \"bill_depth_mm\", \"flipper_length_mm\", \"body_mass_g\"]\n",
    "\n",
    "# Standardize the features so that no single one dominates the curves\n",
    "scaled_features = StandardScaler().fit_transform(df[cols])\n",
    "# Reuse the original index so the species labels align row by row\n",
    "df_scaled = pd.DataFrame(scaled_features, columns=cols, index=df.index)\n",
    "df_scaled[\"species\"] = df[\"species\"]\n",
    "\n",
    "hvplot.plotting.andrews_curves(\n",
    "    df_scaled,\n",
    "    class_column=\"species\",\n",
    "    samples=30,\n",
    "    title=\"Andrews Curves Plot (Matplotlib)\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22add9a1-1460-4cca-bfdb-7082d17112a3",
   "metadata": {},
   "source": [
    ":::{seealso}\n",
    "- [Andrews Curves reference documentation](../../ref/api/manual/hvplot.plotting.andrews_curves.ipynb).\n",
    ":::"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
